Card et al.

mentions: 1 | type: Person

// recent coverage: 1 mention

2026-06-24 06:32 | dev.to | machine-learning

Bootstrap confidence intervals for your LLM eval metrics

Nexus Labs' fine-tuning and evaluation team lead demonstrated that a single evaluation metric like 84.2% accuracy on a 500-example set carries significant uncertainty, with a 95% bootstrap confidence …
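
To make the idea in this summary concrete, here is a minimal sketch of a percentile bootstrap for an accuracy metric. It is not the article's own code: the function name `bootstrap_accuracy_ci`, the resample count, and the seed are assumptions chosen for illustration, and the per-example outcomes are synthetic (421 correct out of 500, which reproduces the 84.2% figure). It uses only the Python standard library.

```python
import random


def bootstrap_accuracy_ci(outcomes, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes.

    Hypothetical helper for illustration; not taken from the article.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    stats = []
    for _ in range(n_resamples):
        # Resample the eval set with replacement, same size as the original.
        resample = rng.choices(outcomes, k=n)
        stats.append(sum(resample) / n)
    stats.sort()
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


# Synthetic data (assumption): 421 correct out of 500 examples = 84.2% accuracy.
outcomes = [1] * 421 + [0] * 79
point = sum(outcomes) / len(outcomes)
low, high = bootstrap_accuracy_ci(outcomes)
print(f"accuracy = {point:.3f}, 95% bootstrap CI ~ [{low:.3f}, {high:.3f}]")
```

The exact endpoints depend on the seed and the number of resamples, so treat the printed interval as approximate. The point is that a single accuracy number on 500 examples comes with a spread of several percentage points.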

// co-occurs with top 3 entities

Nexus Labs (1), Dror et al. (1), scipy (1)